AITopics | different llm

different llm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Smoothie: Label Free Language Model Routing

Neural Information Processing Systems, Mar-22-2026, 18:35:57 GMT

Large language models (LLMs) are increasingly used in applications where LLM inputs may span many different tasks. Recent work has found that the choice of LLM is consequential, and different LLMs may be good for different input samples. Prior approaches have thus explored how engineers might select an LLM to use for each sample (i.e., routing). While existing routing methods mostly require training auxiliary models on human-annotated data, our work explores whether it is possible to perform routing without labeled data. We propose Smoothie, a weak supervision-inspired routing approach that requires no labeled data. Given a set of outputs from different LLMs, Smoothie constructs a latent variable graphical model over embedding representations of observable LLM outputs and unknown "true" outputs. Using this graphical model, we estimate sample-dependent quality scores for each LLM, and route each sample to the LLM with the highest corresponding score. We find that Smoothie's LLM quality scores correlate with ground-truth model quality (correctly identifying the optimal model on 9/14 tasks), and that Smoothie outperforms routing baselines by up to 10 accuracy points.
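
The routing step described in this abstract can be illustrated with a minimal sketch. It does not implement Smoothie's latent variable graphical model. As a stand-in assumption, it scores each LLM by how close its output embedding lies to the other LLMs' output embeddings for the same sample, then routes the sample to the highest-scoring LLM. The function names (`consensus_scores`, `route`) and the toy two-dimensional embeddings are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def consensus_scores(embeddings: List[Sequence[float]]) -> List[float]:
    """Label-free proxy score per LLM for ONE sample.

    Assumption (not the paper's estimator): an output that agrees with the
    other models' outputs in embedding space is more likely to be good, so
    the score is the negative mean squared distance to the other outputs.
    """
    m = len(embeddings)
    if m < 2:
        raise ValueError("need outputs from at least two LLMs")
    scores = []
    for i in range(m):
        total = sum(
            squared_distance(embeddings[i], embeddings[j])
            for j in range(m)
            if j != i
        )
        scores.append(-total / (m - 1))
    return scores


def route(embeddings: List[Sequence[float]]) -> int:
    """Return the index of the LLM with the highest score for this sample."""
    scores = consensus_scores(embeddings)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy embeddings of three LLM outputs for a single input sample.
    sample_outputs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
    print(route(sample_outputs))  # index of the selected LLM
```

Because the scores are computed per sample, different inputs can be routed to different LLMs without any labeled data, which is the property the abstract emphasizes.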

artificial intelligence, large language model, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

A Task Level Case Study

Neural Information Processing Systems, Feb-11-2026, 17:35:56 GMT

This section illustrates how a model's performance may vary across different tasks associated with [...] We analyzed the performance of Llama-3-Instruct-70B on the new term "wokely," [...]

The book's cover was described as wokely by several reviewers.
A. it struggled to attract attention on the bookstore displays despite a [...]
B. many readers were enticed to buy it, strengthening its presence on [...]
C. readers were intrigued and the book's sales experienced an unexpected surge worldwide.
D. the publisher decided to release a limited edition with a special [...]

In the previous sentence, does _ refer to A. [...]

Is this example in line with commonsense and grammatically correct?

As observed, the model only answered correctly in the COMA task but failed in the other two tasks. In the COMA task, the model successfully inferred that "wokely" carries a negative connotation, [...] Although the phrase "hard to find a satisfying [...] These results provide a comprehensive evaluation of the model's understanding of the term "wokely."

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.46)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

ROBOPSY PL[AI]: Using Role-Play to Investigate how LLMs Present Collective Memory

Jahrmann, Margarete, Brandstetter, Thomas, Glasauer, Stefan

arXiv.org Artificial Intelligence, Oct-14-2025

The paper presents the first results of an artistic research project investigating how Large Language Models (LLMs) curate and present collective memory. In a public installation exhibited for two months in Vienna in 2025, visitors could interact with five different LLMs (ChatGPT with GPT 4o and GPT 4o mini, Mistral Large, DeepSeek-Chat, and a locally run Llama 3.1 model), which were instructed to act as narrators, implementing a role-playing game revolving around the murder of Austrian philosopher Moritz Schlick in 1936. Results of the investigation include protocols of LLM-user interactions during the game and qualitative conversations after the play experience to gain insight into the players' reactions to the game. In a quantitative analysis, 115 introductory texts for role-playing generated by the LLMs were examined with different methods of natural language processing, including semantic similarity and sentiment analysis. While the qualitative player feedback made it possible to distinguish three distinct types of users, the quantitative text analysis showed significant differences in how the different LLMs presented the historical content. Our study thus adds to ongoing efforts to analyse LLM performance, and also suggests how these efforts can be disseminated in a playful way to a general audience.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.09874

Country: Europe > Austria > Vienna (0.36)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

3eec719ab86712d32b065c5977f94ad0-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Oct-10-2025, 00:13:49 GMT

benchmark, llm, new term, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.46)

Industry:

Retail (0.46)
Information Technology (0.46)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Instance-level Randomization: Toward More Stable LLM Evaluations

Li, Yiyang, Wu, Yonghuang, Luo, Ying, Sun, Liangtai, Qin, Zishu, Qiu, Lin, Cao, Xuezhi, Cai, Xunliang

arXiv.org Artificial Intelligence, Sep-17-2025

Evaluations of large language models (LLMs) suffer from instability, where small changes in random factors such as few-shot examples can lead to drastic fluctuations in scores and even in model rankings. Moreover, different LLMs can have different preferences for a given setting of random factors. As a result, using a fixed setting of random factors, which is the common paradigm in current evaluations, can lead to potentially unfair comparisons between LLMs. To mitigate the volatility of evaluations, we first theoretically analyze the sources of variance induced by changes in random factors. Targeting these specific sources, we then propose the instance-level randomization (ILR) method to reduce variance and enhance fairness in model comparisons. Instead of using a fixed setting across the whole benchmark in a single experiment, we randomize all factors that affect evaluation scores for every single instance, run multiple experiments, and report the averaged score. Theoretical analyses and empirical results demonstrate that ILR can reduce the variance and unfair comparisons caused by random factors, and can achieve a similar level of robustness at less than half the computational cost of previous methods.
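
The contrast between a fixed setting and instance-level randomization can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two random factors (few-shot example choice and answer-option order), the `score_fn` callback, and all function names are hypothetical.

```python
import random
from statistics import mean
from typing import Any, Callable, Dict, Sequence


def sample_factors(rng: random.Random, pool: Sequence[Any], k: int) -> Dict[str, Any]:
    """Draw one setting of the random factors (hypothetical choice of factors)."""
    return {
        "few_shot": rng.sample(list(pool), k),      # which demonstrations to show
        "option_order": rng.sample(range(4), 4),    # permutation of 4 answer options
    }


def fixed_setting_score(instances, score_fn: Callable, pool, k: int, seed: int) -> float:
    """Conventional evaluation: one setting shared by the whole benchmark."""
    rng = random.Random(seed)
    factors = sample_factors(rng, pool, k)
    return mean(score_fn(x, factors) for x in instances)


def ilr_score(instances, score_fn: Callable, pool, k: int, n_runs: int, seed: int) -> float:
    """Instance-level randomization: redraw the factors for every instance,
    repeat the experiment n_runs times, and report the averaged score."""
    rng = random.Random(seed)
    run_scores = []
    for _ in range(n_runs):
        run_scores.append(
            mean(score_fn(x, sample_factors(rng, pool, k)) for x in instances)
        )
    return mean(run_scores)
```

Here `score_fn(instance, factors)` stands for whatever builds the prompt from the sampled factors, queries the model, and returns a numeric score for that instance.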

large language model, natural language, random factor, (17 more...)

arXiv.org Artificial Intelligence

2509.12678

Country:

North America > United States (0.68)
Europe (0.68)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry: Leisure & Entertainment > Sports (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Dynamic Collaboration of Multi-Language Models based on Minimal Complete Semantic Units

Hao, Chao, Wang, Zezheng, Huang, Yanhua, Xu, Ruiwen, Niu, Wenzhe, Liu, Xin, Yu, Zitong

arXiv.org Artificial Intelligence, Aug-27-2025

This paper investigates the enhancement of reasoning capabilities in language models through token-level multi-model collaboration. Our approach selects the optimal tokens from the next token distributions provided by multiple models to perform autoregressive reasoning. Contrary to the assumption that more models yield better results, we introduce a distribution distance-based dynamic selection strategy (DDS) to optimize the multi-model collaboration process. To address the critical challenge of vocabulary misalignment in multi-model collaboration, we propose the concept of minimal complete semantic units (MCSU), which is simple yet enables multiple language models to achieve natural alignment within the linguistic space. Experimental results across various benchmarks demonstrate the superiority of our method. The code will be available at https://github.com/Fanye12/DDS.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.18763

Country: Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns

Chen, Xin, Wu, Junchao, Yang, Shu, Zhan, Runzhe, Wu, Zeyu, Luo, Ziyang, Wang, Di, Yang, Min, Chao, Lidia S., Wong, Derek F.

arXiv.org Artificial Intelligence, Aug-19-2025

Detecting content generated by large language models (LLMs) is crucial for preventing misuse and building trustworthy AI systems. Although existing detection methods perform well, their robustness in out-of-distribution (OOD) scenarios is still lacking. In this paper, we hypothesize that, compared to features used by existing detection methods, the internal representations of LLMs contain more comprehensive and raw features that can more effectively capture and distinguish the statistical pattern differences between LLM-generated texts (LGT) and human-written texts (HWT). We validated this hypothesis across different LLMs and observed significant differences in neural activation patterns when processing these two types of texts. Based on this, we propose RepreGuard, an efficient statistics-based detection method. Specifically, we first employ a surrogate model to collect representations of LGT and HWT, and extract the distinct activation feature that best identifies LGT. We then classify a text by calculating the projection score of its representation along this feature direction and comparing it with a precomputed threshold. Experimental results show that RepreGuard outperforms all baselines with an average AUROC of 94.92% on both in-distribution (ID) and OOD scenarios, while also demonstrating robust resilience to various text sizes and mainstream attacks. Data and code are publicly available at: https://github.com/NLP2CT/RepreGuard
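
The classification step (project a representation onto a feature direction and compare with a threshold) can be sketched in a few lines. This is an illustration under assumptions, not the RepreGuard code: the feature direction is taken here as the normalized difference of mean representations, the threshold as the midpoint of the two classes' mean scores, and the representations are toy vectors rather than activations from a surrogate model. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def feature_direction(lgt_reps: List[Vector], hwt_reps: List[Vector]) -> List[float]:
    """Unit vector pointing from the HWT mean toward the LGT mean (assumed choice)."""
    diff = [a - b for a, b in zip(mean_vector(lgt_reps), mean_vector(hwt_reps))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("class means coincide; no direction to extract")
    return [d / norm for d in diff]


def projection_score(rep: Vector, direction: Vector) -> float:
    """Dot product of a text representation with the feature direction."""
    return sum(r * d for r, d in zip(rep, direction))


def fit_threshold(lgt_reps: List[Vector], hwt_reps: List[Vector], direction: Vector) -> float:
    """Precompute a threshold as the midpoint of the mean class scores (assumed rule)."""
    lgt_mean = sum(projection_score(r, direction) for r in lgt_reps) / len(lgt_reps)
    hwt_mean = sum(projection_score(r, direction) for r in hwt_reps) / len(hwt_reps)
    return (lgt_mean + hwt_mean) / 2.0


def is_llm_generated(rep: Vector, direction: Vector, threshold: float) -> bool:
    """Classify a text as LLM-generated when its score exceeds the threshold."""
    return projection_score(rep, direction) > threshold
```

Once the direction and threshold are fitted, scoring a new text costs a single dot product, which is consistent with the abstract's description of the method as efficient and statistics-based.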

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.13152

Country:

Asia (1.00)
North America > United States (0.93)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Data Curation Matters: Model Collapse and Spurious Shift Performance Prediction from Training on Uncurated Text Embeddings

Mattioli, Lucas, Hadichou, Youness Ait, Chaouche, Sabrina, Gonzalez, Martin

arXiv.org Artificial Intelligence, Jun-24-2025

Training models on uncurated Text Embeddings (TEs) derived from raw tabular data can lead to a severe failure mode known as model collapse, where predictions converge to a single class regardless of input. By comparing models trained with identical hyper-parameter configurations on both raw tabular data and their TE-derived counterparts, we find that collapse is a consistent failure mode in the latter setting. We introduce a set of metrics that capture the extent of model collapse, offering a new perspective on TE quality as a proxy for data curation. Our results reveal that TEs alone do not effectively function as a curation layer, and that their quality significantly influences downstream learning. More insidiously, we observe that the presence of model collapse can yield an artificially inflated and spurious Accuracy-on-the-Line correlation. These findings highlight the need for more nuanced curation and evaluation of embedding-based representations, particularly in out-of-distribution settings.
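
The abstract does not spell out its collapse metrics, so the following is only a hedged sketch of two generic ways to quantify "predictions converge to a single class": the share of the most frequent predicted class, and the entropy of the predicted-class distribution normalized by its maximum. Both function names and both metrics are assumptions for illustration and should not be read as the paper's definitions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def dominant_class_share(predictions: Sequence[Hashable]) -> float:
    """Fraction of predictions that fall in the most frequent class.

    A value of 1.0 means every input received the same predicted label.
    """
    if not predictions:
        raise ValueError("predictions must be non-empty")
    counts = Counter(predictions)
    return max(counts.values()) / len(predictions)


def normalized_prediction_entropy(predictions: Sequence[Hashable], num_classes: int) -> float:
    """Entropy of the predicted-label distribution divided by log(num_classes).

    0.0 indicates full collapse onto one class; 1.0 indicates predictions
    spread uniformly over all classes.
    """
    if not predictions:
        raise ValueError("predictions must be non-empty")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    n = len(predictions)
    counts = Counter(predictions)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(num_classes)
```

Comparing these values for a model trained on raw tabular data and for its TE-derived counterpart would give a simple, label-free signal of collapse in the sense the abstract describes.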

data quality, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2506.17989

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Rethinking the Understanding Ability across LLMs through Mutual Information

Wang, Shaojie, Ding, Sirui, Zou, Na

arXiv.org Artificial Intelligence, Jun-2-2025

Recent advances in large language models (LLMs) have revolutionized natural language processing, yet evaluating their intrinsic linguistic understanding remains challenging. Moving beyond specialized evaluation tasks, we propose an information-theoretic framework grounded in mutual information (MI) to achieve this. We formalize understanding as the MI between an input sentence and its latent representation (sentence-level MI), measuring how effectively input information is preserved in the latent representation. Given that LLMs learn embeddings for individual tokens, we decompose sentence-level MI into token-level MI between tokens and sentence embeddings, establishing theoretical bounds connecting these measures. Based on this foundation, we theoretically derive a computable lower bound for token-level MI using Fano's inequality, which directly relates to token-level recoverability, that is, the ability to predict original tokens from the sentence embedding. We implement this recoverability task to comparatively measure MI across different LLMs, revealing that encoder-only models consistently maintain higher information fidelity than their decoder-only counterparts, with the latter exhibiting a distinctive late-layer "forgetting" pattern where mutual information is first enhanced and then discarded. Moreover, fine-tuning to maximize token-level recoverability consistently improves the understanding ability of LLMs on tasks without task-specific supervision, demonstrating that mutual information can serve as a foundation for understanding and improving language model capabilities.
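
The link between recoverability and a lower bound on MI follows from the standard form of Fano's inequality. The derivation below is a generic sketch, and the paper's exact bound may differ; the symbols are assumptions introduced here: $X$ is a token drawn from a vocabulary $\mathcal{V}$, $Z$ is the sentence embedding, $\hat{X} = g(Z)$ is any predictor of the token from the embedding, $P_e = \Pr[\hat{X} \neq X]$ is its error probability, and $H_b$ is the binary entropy function.

```latex
\begin{align}
  H(X \mid Z) &\le H_b(P_e) + P_e \log\bigl(|\mathcal{V}| - 1\bigr)
    && \text{(Fano's inequality)} \\
  I(X; Z) &= H(X) - H(X \mid Z) \\
          &\ge H(X) - H_b(P_e) - P_e \log\bigl(|\mathcal{V}| - 1\bigr)
\end{align}
```

For error rates in the usual regime $P_e \le 1 - 1/|\mathcal{V}|$, the right-hand side grows as $P_e$ shrinks, so a representation from which tokens can be recovered more accurately is guaranteed to carry more information about them.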

information, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.2379

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
